Abstract —This study relates the local property of node dominance
to local and global properties of a network. Iterative
removal of dominated nodes yields a distributed algorithm for 
computing a core-periphery decomposition of a social network, 
where nodes in the network core are seen to be essential in 
terms of network flow and global structure. Additionally, the 
connected components in the periphery give information about 
the community structure of the network, aiding in community detection.
A number of explicit results are derived, relating the core
and periphery to network flow, community structure and global 
network structure, which are corroborated by observational 
results. The method is illustrated using a real-world network
(the DBLP co-authorship network) with ground-truth communities.

Index Terms —Core-periphery, community detection, simplicial 
collapse, topological data analysis, social network. 


I. Introduction

One of the interesting challenges in social networks is to
relate local connectivity properties to global structure.
The motivation for doing so stems from the belief that local
properties reflect interactions amongst individuals (or entities). 
Therefore such relationships help us make inferences about the 
nature of interactions which led to the network, by studying its 
global properties. In this paper, we present the local property 
of node dominance as a method for network analysis. We 
will show why node dominance is such a useful criterion, 
by developing a low complexity, distributed algorithm for the 
core-periphery decomposition of a network based on node 
dominance criteria. We will also demonstrate its relation to 
the network community structure. 

Owing to a localized definition, the node dominance criterion
for a node v can be determined from its two-hop neighborhood
alone. A node v is dominated by node w if all nodes that
share an edge with v also share an edge with w. The formal
definition of node dominance is based on a simplicial complex
(as opposed to graph) structure, and will be discussed in detail 
later. If we iteratively collapse dominated nodes, the resulting 
set (the network core) is shown to consist of nodes that are 
important with respect to the network flow, community structure,
and global network structure. One especially important
property of the core is the preservation of shortest distances,
so a shortest path between any two nodes in the core is also 
a shortest path between them in the original network. The 
network periphery (the complement to the core, consisting 
of dominated nodes) is seen to consist of many connected 
components, including all the nodes in the network through 
which no shortest paths pass. These peripheral components 
also play a key role in the community structure of the network. 

The intuitive notion that a network naturally decomposes
into a core and periphery has appeared many times in the
social network literature over the decades. Researchers have
proposed different interpretations of what such a decomposition
should look like, but it is commonly suggested that
a ‘core’ should be central to the network (with respect to
information flow, or shortest paths) [?], have high average
degree [?], and be relatively well-connected both internally
and to the periphery [?]. In contrast, the periphery should
be connected to the core, but extremely sparsely connected
amongst itself.

Borgatti and Everett [?] were the first to attempt to analytically
describe these intuitive properties. They proposed
an ‘idealized core-periphery’ structure, wherein every core node is
connected to every other core node, each peripheral node is
connected to the core, and no peripheral nodes are connected
to each other. They then learn the core-periphery
structure for a given network by labeling each node as
‘core’ or ‘periphery’ in the way that best correlates with this
idealized structure. This method assumes explicitly that the
probability of two nodes being joined by an edge is only
a function of their ‘core-ness’, as opposed to some other
characteristics, such as community membership. In this sense,
the core-periphery model considered in [?] is in contrast
to common network models based on community structure.
Both core-periphery and community network structures can be
expressed using a stochastic blockmodel approach [?], but with
different parameters, so under these models a given network
will not display both structures simultaneously.

Another approach, by Rombach et al. [?], presents a generalization
of Borgatti and Everett’s philosophy, where a core
score is computed for each node, using a range of possible
core sizes and continuous/discrete transitions between core
and periphery. Here, they admit that both core-periphery
and community structure are often present in real-world networks,
but still propose the core-periphery decomposition as
an alternative/complementary analysis to the more common
community detection methods. In Della Rossa et al. [?], an
approach to periphery detection based on random walks is
taken, where it is assumed that, due to the extremely sparse
connectivity of the periphery, a random walk will exit the set
of peripheral nodes very quickly. Thus, a core-periphery profile
for the network, along with a coreness value for each node,
is computed using a greedy algorithm that incrementally adds
nodes to the periphery in a way that minimizes the expected
exit time of a random walk. Again, this method focuses very
heavily on the sparsity of the periphery, and is somewhat
unrelated to any community structure that may be present in
the network. For a good review of existing methods of core-periphery
network decomposition, see the survey by Csermely
et al. [?], or the introductory sections in [?].

Traditionally, approaches to community detection in networks
have assumed that communities form a partition of the
network, with each node belonging to exactly one community.
A foundational method has been the Girvan-Newman algorithm
[?], where communities are detected through iterative
removal of edges with high betweenness centrality. They
defined the notion of ‘modularity’ as a stopping criterion for
their algorithm, and many subsequent algorithms attempt to
partition a network in such a way that optimizes (usually
approximately) modularity [?], or cut ratio (approximated
using spectral clustering) [?]. Fortunato provides an excellent
overview of the breadth and depth of approaches to the
community detection problem in his 100 page survey paper
[?]. In more recent years, researchers have found that
partition-based methods are often somewhat unrealistic, since
real-world networks with ground-truth communities typically
display overlapping community structure [?], where one node
may have multiple community memberships. See Xie et al.
[?] for a survey of methods for overlapping community
detection, including clique percolation, link clustering, and
fuzzy detection methods using mixed-membership stochastic
block models or nonnegative matrix factorization.

A particularly realistic model for overlapping community
detection is Yang and Leskovec’s community-affiliation graph
model (AGM) [?], [?]. This model considers communities
as ‘overlapping tiles’, and its distinguishing feature is that
regions of community overlaps are more densely connected
than regions involving single communities. Precisely, the probability
of an edge existing between two vertices is based on
the communities they share, with higher probability when
they have more community memberships in common. This
assumption is validated on data sets with ground-truth community
memberships available, where higher edge densities
are observed in community intersections [?]. AGM and the
other methods for overlapping community detection are more
realistic than the partition-based methods, but they do not scale
up well with the size of the network. A recent relaxation of AGM,
referred to as Cluster Affiliation Model for Big Networks
(BIGCLAM) [?], allows nodes to have continuous-valued
community memberships, indicating their degree of involvement
in a given community. This reduces the combinatorial
optimization in AGM to a continuous optimization that can
be solved using nonnegative matrix factorization, making it
viable for large networks. We will return to these models in
Section IV.

In the current paper, we will see how a core-periphery structure
and a community structure are both present in real-world
networks, and how node dominance informs us about both.
The relationship between the core-periphery and community
structure of a network has been touched upon previously by
Leskovec et al. [?], where they also noted the presence of
a network periphery, defined in terms of whiskers (clusters of
nodes that are separable from the main network by removing
a single edge), which were interpreted as small communities,
weakly connected to the remaining network “core”. In the
AGM model mentioned above [?], Yang and Leskovec refer
to the overlapping portions of communities as the “core” of the
network. We will see that this interpretation does in fact concur
with our notion of core and periphery: in networks
with ground-truth communities available, the nodes in the
core obtained using node dominance typically have multiple
community memberships, while the nodes in the periphery
have fewer community memberships (often just one).

Iterative node dominance collapses were originally proposed
independently by Wilkerson et al. [?] and Barmak and
Minian [?], as a homology/homotopy-preserving simplification
of a simplicial complex, with the distributed version
described in [?]. Here, we explore much more deeply the
use of this simplification as a network core, and describe the
relationship between the core-periphery decomposition and
the community structure, global structure, and network flow
properties.

In Section II, we will first describe the relevant information
for the simplicial complex representation of a network, and the
background and definition of the node dominance criterion.
We follow this in Section III with statements and derivations of
the resulting properties of the core-periphery decomposition, and
present an algorithm for the use of peripheral components in
community detection. In Section IV, we illustrate our method
with two real-world network data sets which contain ground-truth
community information. We not only empirically verify
the importance of core nodes with respect to network flow
and global structure, but see that our proposed use of the
peripheral components for community detection outperforms
BIGCLAM, which is considered the current state-of-the-art
method for overlapping community detection in large networks.
Finally, in Section V, we draw some conclusions,
and discuss the limitations of our method, as well as some
directions for future research.

II. Background
A. Simplicial homology 

A graph $G = G(V, E)$ is defined by a list, $V$, of its vertices,
as well as a list, $E$, of the pairs of vertices that are joined
by an edge. An implicit assumption in this is that an edge
$e = (v_i, v_j) \in E$ can only be present in $G$ if both of its
vertices $v_i$ and $v_j$ are in $V$. The notion of a simplicial complex
is a higher-order generalization of a graph, while similarly
preserving this ‘closed under subsets’ property.

Definition (Simplicial complex). A $k$-simplex $\sigma =
\{v_0, v_1, \ldots, v_k\}$ is a set of $(k+1)$ singleton elements (called
vertices). A simplicial complex $K$ is a set of simplices (i.e. a
set of sets of vertices) such that

(i) if $\sigma, \tau \in K$, then $\sigma \cap \tau \in K$

(ii) if $\tau < \sigma$ for some $\sigma \in K$, then $\tau \in K$

where $<$ indicates the subset relation. If $\tau < \sigma$, we call $\tau$ a
face of $\sigma$.

A simplex $\sigma$ is maximal if there is no $\tau \in K$ such that $\sigma < \tau$.
A $k$-simplex has dimension $k$. The dimension of a simplicial
complex $K$ is the maximum dimension of any simplex in $K$:

$$\dim(K) = \max_{\sigma \in K} \dim(\sigma).$$

A subset $K'$ of a simplicial complex $K$ is called a subcomplex
if $K'$ is itself a simplicial complex (satisfying properties (i)
and (ii) above). The $k$-skeleton of $K$ is the subcomplex formed
by all simplices in $K$ with dimension at most $k$:

$$k\text{-skeleton of } K = \{\sigma \in K \mid \dim(\sigma) \leq k\}.$$

Definition. Let $K_1$ and $K_2$ be two simplicial complexes
with vertex sets $V_1$ and $V_2$. A map $\phi_0 : V_1 \to V_2$ on the
vertex sets induces a simplicial map $\phi : K_1 \to K_2$ on the
complexes, if for every simplex $\sigma = \{v_0, \ldots, v_k\} \in K_1$, the
set $\{\phi_0(v_0), \ldots, \phi_0(v_k)\}$ spans a simplex in $K_2$. A simplicial
map $\phi : K_1 \to K_2$ induced by an isomorphic map on the
vertex sets is said to be an isomorphic simplicial map, and in
this case, $K_1$ and $K_2$ are isomorphic simplicial complexes.


In Section III, this isomorphism between complexes will be
used to describe the uniqueness of the core obtained using
node dominance collapsing.

Given a graph $G = G(V, E)$, we can think of $G$ as the 1-skeleton
of a simplicial complex, whose higher-dimensional
simplices have not been directly observed. The maximal
simplicial complex whose 1-skeleton is equal to $G$ is called
the flag complex.

Definition (Flag complex). Given a graph $G = G(V, E)$, the
simplicial complex

$$X(G) = \{\sigma = \{v_0, \ldots, v_k\} \mid (v_j, v_l) \in E \text{ for all } 0 \leq j < l \leq \dim \sigma\}$$

contains a simplex $\sigma$ whenever all pairs of vertices in $\sigma$ are
connected by an edge in $E$. $X(G)$ is called the flag complex
of $G$.

As we will see in Section II-B1, if we have additional
information about the $k$-tuple relations in $G$, we may build a
simplicial complex using that information, adding a $k$-simplex
$\sigma$ whenever its vertices satisfy a $k$-tuple relation, and all
faces of the simplex are also present. In the absence of such
information, when only the graph $G$ is given, we propose the
use of the flag complex, and see that it can be very informative.

A final notion we will mention here is the definition of the
homology of a simplicial complex.

Definition (Homology). We encode the structure of a simplicial
complex $X$ through boundary maps $\partial_k$, where
$\partial_k$ gives the oriented connectivity information between $k$-simplices
and $(k-1)$-simplices. Then the $k$-th homology group
of $X$ is

$$H_k(X) = \ker(\partial_k) / \operatorname{im}(\partial_{k+1}).$$

See, for example, [?] for a more mathematically complete
definition of simplicial homology.

Intuitively, the dimension of the $k$-th homology space
counts the number of $k$-dimensional “holes” in the simplicial
complex. These can be thought of as $(k+1)$-dimensional
voids enclosed by $k$-simplices, so $H_1$ counts the number of
loops which are not “filled in” by triangles, and $H_2$ counts the
number of voids. The interpretation of $H_0$ is slightly different:
it counts the number of connected components of $X$ (which
may be interpreted as cycles of dimension zero).

The sequence of homology spaces of a simplicial complex,
in essence, specifies the “global structure” of the complex. For
our purposes, we will not be computing any homology directly,
but we will see that by preserving homology during our node
dominance collapse, we will in fact be preserving important
global structure of the network.

B. Node dominance

We will be representing a network using its flag complex,
and in that setting, node dominance is characterized by the
following definition.

Definition. The neighbor set of a node $v$ is the set of all
nodes sharing an edge with $v$, as well as $v$ itself:

$$\mathcal{N}[v] := \{u \in V \mid (u, v) \in E\} \cup \{v\}.$$

A node $v$ is dominated by one of its neighbors $w$ if and only
if $\mathcal{N}[v] \subseteq \mathcal{N}[w]$, i.e., all the neighbors of $v$ are also neighbors
of $w$.

To understand the importance and relevance of this definition,
we will explore a bit of its history, and related concepts.

1) Homology of a relation:


Definition. A relation on two sets $A$ and $B$ is a function $r :
A \times B \to \{0, 1\}$. We say that elements $a_i, a_j \in A$ are related
(through element $b$) if there exists an element $b \in B$ such that
$r(a_i, b) = 1$ and $r(a_j, b) = 1$. Similarly, $b_i, b_j \in B$ are related
if there exists an $a \in A$ such that $r(a, b_i) = 1$ and $r(a, b_j) =
1$. For $A$ and $B$ finite, the relation $r$ can be represented by an
$|A| \times |B|$ binary matrix $R = (r_{ij})$, where $r_{ij} = r(a_i, b_j)$.

As an example, the elements of set A could be actors, and 
the elements of set B could be movies, with r(a, b) = 1 
whenever actor a appears in movie b. 

Given a relation, there are two ways to encode its structure
as a simplicial complex. In the first way, which we will denote
as $X_R(A, B)$, the elements of $A$ are represented as vertices,
and vertices $\{a_{i_0}, \ldots, a_{i_k}\}$ are spanned by a $k$-simplex
whenever there exists a $b \in B$ such that $r(a_{i_l}, b) = 1$ for
all $l = 0, 1, \ldots, k$. In the second way, which we will denote
as $X_R(B, A)$, the elements of $B$ are represented as vertices,
and $\{b_{j_0}, b_{j_1}, \ldots, b_{j_k}\}$ are similarly spanned by a $k$-simplex
whenever they are all related by the same $a \in A$. Note also that
for any simplicial complex $X$ (even if it wasn’t constructed
using a relation) one may form its dual complex $\bar{X}$, by letting
each maximal simplex in $X$ correspond to a vertex in $\bar{X}$. In
that case, a set of vertices in $\bar{X}$ is spanned by a simplex if
their associated simplices in $X$ all have a vertex in common.
In the example with actors and movies, this means that
we can represent their relationships by building a simplicial 
complex where actors are vertices, and simplices are formed 
between actors who are in the same movie; or alternatively, 
we can encode it by using movies as vertices and spanning 
a set of movies by a simplex when they all feature the same 
actor. 
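As a small illustration of the two encodings, the following Python sketch builds the maximal simplices of both complexes from a toy relation. The actor and movie names, and the helper names `transpose` and `maximal_simplices`, are hypothetical and chosen only for this example; it is a minimal sketch, not an implementation taken from the paper.

```python
# Hypothetical relation r: actor -> set of movies the actor appears in.
appears_in = {
    "a1": {"m1", "m2"},
    "a2": {"m1"},
    "a3": {"m2", "m3"},
}

def transpose(relation):
    """Swap the roles of the two sets (here: movie -> set of actors)."""
    out = {}
    for a, bs in relation.items():
        for b in bs:
            out.setdefault(b, set()).add(a)
    return out

def maximal_simplices(relation):
    """Maximal simplices of the complex whose vertices are the keys of `relation`.

    Every element of the other set spans one simplex on the keys related to it;
    simplices strictly contained in another simplex are dropped.
    """
    candidates = {frozenset(vs) for vs in transpose(relation).values()}
    return {s for s in candidates if not any(s < t for t in candidates)}

# Actors as vertices: one simplex per movie cast.
actor_complex = maximal_simplices(appears_in)
# Movies as vertices: one simplex per actor's filmography.
movie_complex = maximal_simplices(transpose(appears_in))
```

In this toy case both complexes consist of two edges sharing a vertex, so they have the same (trivial) homology even though their vertex sets differ.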

Note that these two simplicial complexes may have drastically
different structure (different number of vertices, different
dimension), but Dowker [?] proved that the two complexes
have exactly the same homology (in the sense that the
homology groups of the two complexes are isomorphic, for
all $k$).

Theorem II.1 (Dowker). If $R$ is a relation on sets $A$
and $B$, with associated simplicial complexes $X_R(A, B)$ and
$X_R(B, A)$, then

$$H_k(X_R(A, B)) \cong H_k(X_R(B, A)) \text{ for all } k.$$

2) Node dominance and equivalent notions: In light of the
dual simplicial complexes presented in Section II-B1, we can
now give the more general definition of node dominance.

Definition (Node dominance). Given a simplicial complex $X$
and its dual complex $\bar{X}$, each vertex $v \in X$ has an associated
simplex $\sigma_v \in \bar{X}$. We say a vertex $v$ is dominated by vertex
$w$ if $\sigma_v$ is a face of $\sigma_w$. This occurs exactly when the set of
simplices incident to (i.e. containing) $v$ is a subset of the set
of simplices incident to $w$ (in $X$).

When the simplicial complex of interest is a flag complex, 
we know that the presence of a higher dimensional simplex 
is determined by the presence of its constituent edges. This is 
why we are able to check the node dominance criterion using 
only the neighbor sets of our vertices, in the flag complex 
setting: if the neighbors of v are all neighbors of w, then the
set of simplices incident to v is a subset of the set of simplices
incident to w.
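In the flag complex setting, the check therefore reduces to a subset test on closed neighbor sets. A minimal Python sketch follows, assuming the graph is stored as a dictionary mapping each node to the set of its neighbors; the function names and the toy graph are illustrative assumptions, not part of the original work.

```python
def closed_neighbors(adj, v):
    """Closed neighbor set N[v]: the neighbors of v together with v itself."""
    return set(adj[v]) | {v}

def is_dominated_by(adj, v, w):
    """True if v is dominated by its neighbor w, i.e. N[v] is a subset of N[w]."""
    if w == v or w not in adj[v]:
        return False
    return closed_neighbors(adj, v) <= closed_neighbors(adj, w)

# Hypothetical undirected toy graph, stored as node -> set of neighbors.
adj = {
    "v": {"w", "x"},
    "w": {"v", "x", "y"},
    "x": {"v", "w"},
    "y": {"w"},
}

v_by_w = is_dominated_by(adj, "v", "w")  # N[v] = {v, w, x} lies inside N[w]
w_by_v = is_dominated_by(adj, "w", "v")  # y is a neighbor of w but not of v
```

Note that dominance can be mutual (here `v` and `x` have identical closed neighbor sets), which is why a tie-breaking step is needed when both nodes could be removed.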

To illustrate the concept of node dominance using the
example of actors and movies, consider two actors, represented
by separate vertices $a_i$ and $a_j$ in $X_R(A, B)$. If the set of movies
featuring actor $a_i$ is a (proper) subset of the movies featuring
actor $a_j$ (i.e. $a_i$ is dominated by $a_j$), then in the dual complex
$X_R(B, A)$, the simplex $\sigma_{a_i}$ will be a (proper) face of simplex
$\sigma_{a_j}$. Thus, removing actor $a_i$ (and all its incident simplices)
completely will not change the simplicial structure of the
dual complex $X_R(B, A)$ at all, and thus will not change the
homology of the original complex $X_R(A, B)$.

The insight that removing dominated nodes does not change
the homology of the simplicial complex suggests an algorithm,
as originally proposed (independently) by [?] and [?],
to simplify a simplicial complex by iteratively removing such
vertices. In the work by Barmak and Minian [?], the removal
of a dominated node is termed a strong homotopy collapse, since
node dominance is a stricter condition than that required for
a regular homotopy-preserving simplicial collapse [?].

In Figure 1, vertex v is dominated by vertex w, where vertex
w could have additional connections in the network which
are not shown. The removal of vertex v does not create or
destroy any connected components, loops, or voids (preserves
homology), and does not affect shortest path lengths between
other nodes (see Section III-A).



Fig. 1. Node v, dominated by node w. Removal of v only has local effects. 

One more definition we will note is that of a 2-hop neighbor
set, which is the neighbor set of a node that also contains all
“friends of friends”, instead of just immediate neighbors:

$$\mathcal{N}_2[v] = \{u \mid (u, v) \in E, \text{ or } (u, v_i) \in E \text{ for some } v_i \in \mathcal{N}[v]\}$$

Performing the node dominance collapse using the 2-hop neighbor
set can allow greater collapsibility in networks with few dominated
nodes. It also allows small holes in the flag complex (i.e. those
with hop length < 6) to be “filled in”, so only larger homological
features are preserved. We will use this version of the node dominance
collapse on one of the data sets in Section IV.
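A short Python sketch of the 2-hop neighbor set is given below, again assuming an adjacency dictionary of neighbor sets. The function name and the toy path graph are illustrative assumptions. One convention is assumed here: the node itself is always included, as in the closed neighbor set.

```python
def two_hop_neighbors(adj, v):
    """2-hop neighbor set of v: v, its neighbors, and all neighbors of those neighbors."""
    closed = set(adj[v]) | {v}
    result = set(closed)
    for u in closed:
        result |= set(adj[u])
    return result

# Hypothetical path graph a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

n2_a = two_hop_neighbors(path, "a")
n2_b = two_hop_neighbors(path, "b")
```

Replacing the closed neighbor set by this set in the dominance test gives the 2-hop variant of the collapse; in the path example the 2-hop set of `a` is contained in that of `b`.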

3) Distributed algorithm for flag complexes: Assuming a flag
complex structure, the node dominance collapse can be performed
referring only to its 1-skeleton (the original graph under analysis).
Moreover, the criterion for determining node dominance requires
only local information, making the algorithm distributed in nature.
This algorithm was first presented in [?].

Each node v has the list of its neighbor set $\mathcal{N}[v]$, and it then
executes the following steps during each iteration:


Distributed algorithm for node dominance collapse

Broadcast N[v] to neighbors
for v_i ∈ N[v], v_i ≠ v
    Receive N[v_i]
    if N[v_i] ⊆ N[v]
        Broadcast OFF to v_i
        if OFF received from v_i
            Handshake to determine if v or v_i turns off
        end if
    end if
end for

if OFF received OR Handshake determined v turns off
    v designated OFF
else
    Update N[v], omitting OFF neighbors
end if
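The effect of the procedure above can be simulated on a single machine. The following Python sketch is a centralized, sequential stand-in for the distributed algorithm, not the distributed implementation itself: it removes one dominated node at a time in a fixed (sorted) order, which replaces the handshake. The function name and toy graph are assumptions made for illustration.

```python
def node_dominance_core(adj):
    """Iteratively remove dominated nodes; return (core, periphery) as sets.

    `adj` maps each node to the set of its neighbors (undirected graph).
    Only one node is removed per scan, so mutual dominance never removes
    both nodes at once and no handshake is needed.
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    periphery = set()
    changed = True
    while changed:
        changed = False
        for v in sorted(adj):  # fixed order, for reproducibility only
            closed_v = adj[v] | {v}
            if any(closed_v <= (adj[w] | {w}) for w in adj[v]):
                for w in adj[v]:
                    adj[w].discard(v)  # neighbors update their neighbor sets
                del adj[v]
                periphery.add(v)
                changed = True
                break  # neighbor sets changed, so rescan from the start
    return set(adj), periphery

# Hypothetical toy graph: a 4-cycle c1-c2-c3-c4 with two attached nodes p and q.
graph = {
    "c1": {"c2", "c4", "p", "q"},
    "c2": {"c1", "c3", "q"},
    "c3": {"c2", "c4"},
    "c4": {"c1", "c3"},
    "p": {"c1"},
    "q": {"c1", "c2"},
}

core, periphery = node_dominance_core(graph)
```

Here the 4-cycle survives as the core (it encloses a loop that is not filled in by triangles), while `p` and `q` are collapsed into the periphery.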


A very similar distributed algorithm is also possible in the non-flag
complex setting, where there exists some a priori information
about which $k$-tuples of simplices are related. An example of this
would be the list of movies and actors, or some other relation
(e.g. authors/papers). In that case three actors (vertices) are only
spanned by a triangle when there is a single movie they all appeared
in together, not merely when they have all appeared in movies together
pairwise, as in the flag complex case. To compute node dominance
in that setting, we only need to assume that each node has access
to its list of maximal simplices (e.g. an actor has its movie list, an
author has its paper list, etc.). Then the algorithm above can proceed
exactly as written, with $\mathcal{N}[v]$ replaced by the maximal simplex list
of $v$.
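The difference between the two settings can be seen on a three-actor example in which every pair of actors shares a movie but no movie features all three. The Python sketch below is illustrative only; the data and function names are hypothetical.

```python
# Hypothetical movie lists: every pair of actors co-stars once,
# but no single movie features all three.
movies_of = {
    "a1": {"m1", "m3"},
    "a2": {"m1", "m2"},
    "a3": {"m2", "m3"},
}

def dominated_by_simplex_lists(simplex_lists, v, w):
    """Non-flag setting: v is dominated by w if v's simplex list is contained in w's."""
    return v != w and simplex_lists[v] <= simplex_lists[w]

def coappearance_graph(simplex_lists):
    """1-skeleton: two vertices are joined when they share at least one simplex."""
    return {
        v: {w for w in simplex_lists if w != v and simplex_lists[v] & simplex_lists[w]}
        for v in simplex_lists
    }

def dominated_in_flag_complex(adj, v, w):
    """Flag setting: v is dominated by its neighbor w if N[v] is a subset of N[w]."""
    return w in adj[v] and (adj[v] | {v}) <= (adj[w] | {w})

adj = coappearance_graph(movies_of)
flag_result = dominated_in_flag_complex(adj, "a1", "a2")
relation_result = dominated_by_simplex_lists(movies_of, "a1", "a2")
```

In the flag complex the triangle is filled in and each actor is dominated by the others, whereas with the movie lists no actor is dominated and the loop is retained.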

III. Properties of core and periphery

In this section, we will outline both the analytical and empirically
observed properties of the core-periphery decomposition obtained
through the iterative node dominance collapse. Examples of the
observed properties on real-world data sets are presented in Section
IV-A.

Analytical properties:

1) Shortest paths in the core are shortest paths in the original 
network. (Network flow) 

2) Nodes with betweenness centrality zero are not in the core 
(Network flow) 

3) A node is more likely to be dominated by a node sharing the 
community membership(s) of its neighborhood set, compared 
to a node which does not. (Community structure) 

4) The homology of the flag complex of the core is the same as
the homology of the flag complex of the entire network. (Global
structure)

5) The structure of the core is unique (all possible cores for a
given network are isomorphic as simplicial complexes). (Global
structure)

Observed properties: 

• Core nodes typically have high degree and high betweenness 
centrality. ‘Hub’ nodes are in the core. (Network flow) 

• Nodes with multiple ground-truth community membership labels
tend to be in the core, while nodes with just one (or
no) community labels are usually in the periphery. (Community
structure)

• Using the peripheral groups, we can obtain candidate sets
that are seen to contain a large proportion of ground-truth
communities. See Section IV-C for details, and our use of these
candidate sets for community detection. (Community structure)

• The core is stable with respect to the order of collapses in the 
iterative algorithm. (Global structure) 

Throughout this section, for a graph $G = G(V, E)$, the core $G_c =
G(V_c, E_c)$ is the graph induced by the set of nodes $V_c \subseteq V$ which
remain upon an iterative and total removal of dominated nodes from
$V$. Note that the set $V_c$ (and thus the core itself) is not necessarily
unique, because of a potential random ‘handshake’ in the algorithm
from Section II-B3. The statements given below are valid for any
core obtained by the procedure of iterative node dominance collapse.
As we will discuss further in Section III-C below, all possible cores
obtained from the same initial graph have the exact same structure
(are isomorphic) [?].


A. Network flow 

The properties in this subsection involve statements about shortest
paths between given nodes in the network. An outline of a proof
similar to that of Property III.1 is given in [?], and we include the proof
here for completeness.

Definition (Shortest paths). Given a graph $G' = G(V, E)$, for any
pair of points $v_1, v_l \in V$, a path $p = (v_1, v_2, \ldots, v_l)$
is a sequence of vertices such that $(v_k, v_{k+1}) \in E$ for all $k =
1, \ldots, l-1$. The path has length $|p| = l$, and $p$ is a shortest path if
$|p| \leq |p'|$ for any other path $p'$ from $v_1$ to $v_l$. The set of all shortest
paths from $v_i$ to $v_j$ in the graph $G'$ is denoted $SP_{G'}(v_i, v_j)$.

Property III.1 (Shortest paths in the core are shortest paths in the
original network.). For $v_1, v_2 \in V_c$, if $p \in SP_{G_c}(v_1, v_2)$, then
$p \in SP_G(v_1, v_2)$.

Proof: For any graph $G'$, let $v_j$ be dominated by its neighbor
$v_i$. Consider any shortest path $p = (\ldots, v_k, v_j, v_l, \ldots)$
passing through $v_j$. Note that $k, l \neq i$ [Proof by contradiction:
$p = (\ldots, v_i, v_j, v_l, \ldots)$ could be replaced by the shorter path
$(\ldots, v_i, v_l, \ldots)$, since $\mathcal{N}[v_j] \subseteq \mathcal{N}[v_i]$, so $v_l \in \mathcal{N}[v_j] \Rightarrow v_l \in
\mathcal{N}[v_i]$]. So $p = (\ldots, v_k, v_j, v_l, \ldots)$ can be replaced by $p' =
(\ldots, v_k, v_i, v_l, \ldots)$, which is the same length as $p$, but doesn’t
contain $v_j$.

Therefore, the lengths of all shortest paths in $G'$ (where $v_j$ is not the
source or destination) are preserved when $v_j$ is removed. Applying this
argument to each removal in the iterative collapse gives the property. ■

*Note that there is no loss of generality by using indices $1, 2, \ldots, l$.
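The argument can be checked numerically on a small example. The Python sketch below removes a dominated node from a hypothetical toy graph and compares breadth-first-search distances between the remaining nodes before and after removal. Distances are counted in edges here; the function names and the graph are assumptions made for illustration.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for x in adj[u]:
            if x not in dist:
                dist[x] = dist[u] + 1
                queue.append(x)
    return dist

def remove_node(adj, v):
    """Copy of the graph with node v and its incident edges deleted."""
    return {u: nbrs - {v} for u, nbrs in adj.items() if u != v}

# Hypothetical toy graph in which v is dominated by w: N[v] is a subset of N[w].
adj = {
    "s": {"v", "w"},
    "v": {"s", "w", "t"},
    "w": {"s", "v", "t", "z"},
    "t": {"v", "w"},
    "z": {"w"},
}

reduced = remove_node(adj, "v")
preserved = all(
    bfs_distances(adj, a)[b] == bfs_distances(reduced, a)[b]
    for a in reduced
    for b in reduced
)
```

In this example the shortest path from `s` to `t` through `v` is replaced by the equally long path through `w`, so `preserved` evaluates to true.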


Definition (Betweenness centrality). The betweenness centrality of a
node $v$ is defined as the proportion of shortest paths between nodes
$s$ and $t$ that pass through $v$, summed over all pairs (i.e.)

$$bc(v) = \sum_{s,t \neq v} \frac{|\{p \in SP_G(s,t) \mid v \in p\}|}{|SP_G(s,t)|}$$

Property III.2 (If the size of the core is greater than 1, nodes with
betweenness centrality zero are not in the core).

$$bc(v) = 0 \Rightarrow v \notin V_c$$

Proof: Using the definition of betweenness centrality above, we
can see that

$$bc(v) = 0 \Rightarrow |\{p \in SP_G(s,t) \mid v \in p\}| = 0 \quad \forall s, t \neq v.$$

Therefore, either

(i) $\deg(v) = 1$

(ii) $\forall s, t \in \mathcal{N}[v]$, $(s, t) \in E$ (so that $(\ldots, s, v, t, \ldots)$ will not be in
any shortest path)

If (i), then $v$ is dominated.

If (ii), then $\mathcal{N}[v]$ is a clique, so for any $w \in \mathcal{N}[v]$ with $w \neq v$,
$\mathcal{N}[v] \subseteq \mathcal{N}[w]$. This implies $v$ is dominated by all its neighbors. In
this case, either $v$ is removed and therefore in the periphery, or all
its neighbors are removed and $v$ is the only node in the core. Since
we assume that the size of the core is greater than 1, $v \notin V_c$. ■
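The link between zero betweenness and dominance can be illustrated on a toy graph. The Python sketch below computes betweenness by counting shortest paths with breadth-first search and then checks that each zero-betweenness node has a dominating neighbor. It sums over unordered pairs, which does not affect whether the value is zero; the function names and the graph are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, s):
    """Distances from s and the number of shortest paths from s to each node."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for x in adj[u]:
            if x not in dist:
                dist[x] = dist[u] + 1
                sigma[x] = 0
                queue.append(x)
            if dist[x] == dist[u] + 1:
                sigma[x] += sigma[u]  # every shortest path to u extends to x
    return dist, sigma

def betweenness(adj, v):
    """Sum over unordered pairs s, t != v of the fraction of shortest paths through v."""
    info = {u: bfs_counts(adj, u) for u in adj}
    dist_v, sigma_v = info[v]
    total = 0.0
    for s, t in combinations([u for u in adj if u != v], 2):
        dist_s, sigma_s = info[s]
        if t in dist_s and v in dist_s and t in dist_v:
            if dist_s[v] + dist_v[t] == dist_s[t]:
                total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total

def has_dominating_neighbor(adj, v):
    return any((adj[v] | {v}) <= (adj[w] | {w}) for w in adj[v])

# Hypothetical toy graph.
adj = {
    "s": {"v", "w"},
    "v": {"s", "w", "t"},
    "w": {"s", "v", "t", "z"},
    "t": {"v", "w"},
    "z": {"w"},
}

zero_bc = {u for u in adj if betweenness(adj, u) == 0}
all_dominated = all(has_dominating_neighbor(adj, u) for u in zero_bc)
```

In this graph the nodes `s`, `t` and `z` lie on no shortest path between other nodes, and each of them is dominated by `w`.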

Both of these properties speak to the ‘centrality’ of the nodes in
the core, with respect to the original network. Property III.1 tells us
that there is no way to shortcut through the periphery when traveling
between two nodes in the core, and Property III.2 says the nodes
that are not involved in any shortest paths are guaranteed to be
contained in the periphery. Together, we can conclude that the node
dominance collapse only has local effects (with respect to shortest
paths in the network), in that only shortest paths beginning or ending
at the dominated node are affected.

Empirically, we see that nodes with high betweenness centrality
and nodes with high degree will lie in the core (see Section IV-A
for concrete examples). These are ‘hub’ nodes, in terms of network
flow properties, so removal of nodes in the core has a much greater
impact on network information flow than removal of nodes from the
periphery.

B. Community structure 

The community affiliation graph model (AGM) proposed by Yang
and Leskovec [?] assumes that the probability of an edge forming
between two nodes depends on the community membership(s) of the
nodes under consideration. This is similar to the traditional stochastic
blockmodel (which requires communities to form a partition of the
network), or generalizations [?] of the stochastic blockmodel that
allow for overlapping communities, with the notable exception that
under AGM the edge density in the intersections of communities
is higher than the edge density in the non-overlapping portions of
communities.

For notation, consider the set $C = \{c_k\}_{k=1}^{m}$ defining the $m$
communities in the network, where $c_k$ is the set of nodes belonging
to the $k$-th community. Note that each node in $V$ may belong to
zero, one, or multiple communities. For two nodes $u, v \in V$,
let $C_{uv} = \{c \in C \mid u, v \in c\}$ denote the set of communities
containing both $u$ and $v$. We will also use the more general notation
$C_S = \{c \in C \mid \exists v \in S \text{ s.t. } v \in c\}$ to denote the set of community
memberships for nodes in a given set $S$. Under AGM, an edge forms
between $u$ and $v$, independently, with probability $p_c$ for each of the
communities $c \in C_{uv}$. In other words, denoting the probability of an
edge between $u$ and $v$ by $p(u, v) = P[(u, v) \in E]$, we have

$$p(u, v) = 1 - \prod_{c \in C_{uv}} (1 - p_c). \tag{1}$$

*In practice, this assumption is almost always satisfied. 
Further, Yang and Leskovec define a baseline edge probability
$\varepsilon = p(u, v)$ for $u, v$ with no communities in common. They choose
$\varepsilon = \frac{2|E|}{|V|(|V|-1)}$, which is typically a number of orders of magnitude
smaller than the $p_c$ probabilities. For the proof of the following
result, we assume the AGM model for network community structure;
however, the result would still hold for any model that bases the probability
of an edge between two nodes on the community membership
of the nodes, where the probability of an edge is significantly higher
for nodes sharing communities than nodes not sharing communities.
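As a worked instance of the AGM edge probability, suppose (purely for illustration) that two nodes $u, v$ share two communities $c_1, c_2$ with assumed values $p_{c_1} = 0.3$ and $p_{c_2} = 0.2$. These numbers are hypothetical and are not taken from any data set.

```latex
\begin{align*}
p(u, v) &= 1 - (1 - p_{c_1})(1 - p_{c_2}) = 1 - (0.7)(0.8) = 0.44
  && \text{(both communities shared)} \\
p(u, v) &= 1 - (1 - p_{c_1}) = 0.3
  && \text{(only } c_1 \text{ shared)} \\
p(u, v) &= \varepsilon
  && \text{(no community shared)}
\end{align*}
```

Each additional shared community can only raise the edge probability, which is the sense in which community overlaps are more densely connected under this model.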


Property III.3 (A node is more likely to be dominated by a
node sharing the community membership(s) of its neighborhood set,
compared to a node which does not.). In other words, $v$ is dominated
by $w$ with much higher probability when $C_{\mathcal{N}[v]} \subseteq C_w$ as compared
to the case when $C_{\mathcal{N}[v]} \not\subseteq C_w$.

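Proof: The probability that $v$ is dominated by $w$ is

$$P[v \text{ dom. by } w] = \prod_{v_i \in \mathcal{N}[v]} p(w, v_i) = \prod_{v_i \in \mathcal{N}[v]} \Big( 1 - \prod_{c \in C_{w v_i}} (1 - p_c) \Big).$$

In other words, $v$ will be dominated by $w$ only if there
exist edges between $w$ and all $v_i \in \mathcal{N}[v]$. Each of these
edges occurs independently, with probability $p(w, v_i)$, with
the value given in Equation (1) if $w$ and $v_i$ share community
membership(s) (i.e. if $C_{w v_i} \neq \emptyset$), and $p(w, v_i) = \varepsilon$ otherwise.
Since $\varepsilon \ll p_k$ for all $k$,

$$P[(w, v_i) \in E \mid C_{w v_i} \neq \emptyset] > P[(w, v_i) \in E \mid C_{w v_i} = \emptyset].$$

Therefore

$$P[v \text{ dom. by } w \mid C_{\mathcal{N}[v]} \subseteq C_w] > P[v \text{ dom. by } w \mid C_{\mathcal{N}[v]} \not\subseteq C_w].$$ ■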
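To give a sense of the size of this gap, the following Python sketch evaluates the domination probability as a product of AGM edge probabilities. All numerical values (the community probability 0.3 and the baseline $10^{-6}$) are assumed for illustration, and the function names are hypothetical; the product is taken over the neighbors of $v$ other than $w$ itself.

```python
from math import prod

def edge_probability(shared_probs, epsilon):
    """AGM edge probability: 1 - prod(1 - p_c) over shared communities,
    or the baseline `epsilon` when no community is shared."""
    if not shared_probs:
        return epsilon
    return 1.0 - prod(1.0 - p for p in shared_probs)

def domination_probability(per_neighbor_shared_probs, epsilon):
    """Probability that w is joined to every listed member of N[v].

    per_neighbor_shared_probs[i] holds the p_c values of the communities
    shared by w and the i-th member of N[v].
    """
    return prod(edge_probability(sp, epsilon) for sp in per_neighbor_shared_probs)

EPSILON = 1e-6  # assumed baseline edge probability

# w shares one community (p_c = 0.3) with each of three members of N[v].
shares_all = domination_probability([[0.3], [0.3], [0.3]], EPSILON)

# Same situation, but w shares no community with the third member.
misses_one = domination_probability([[0.3], [0.3], []], EPSILON)
```

With these assumed numbers the first probability is about 0.027, while a single missing shared community reduces it by several orders of magnitude.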


In real world networks (as described in Section IV-A), nodes in
the periphery typically have one (or no) community membership(s),
while nodes in the core have multiple community memberships, and
lie in the intersections of communities. In Section IV-C we will
take this interpretation further, by proposing a method for using the 
peripheral components to obtain candidate sets which are likely to 
contain communities of the network. We can think of the peripheral 
components as the non-overlapping portions of the communities, 
in which case the true network communities would consist of a 
peripheral component, along with adjoining nodes in the core. It is 
also possible that a single community could have non-overlapping 
portions which “stick out” from the core in multiple places, on 
account of which we propose a method of combining peripheral 
components according to which core nodes they connect to. This 
yields an algorithm for obtaining “candidate sets” which are intended 
to contain the true network communities. This method is discussed 
further in Section IV-C.

C. Global structure

As described in Section II-B, when the flag complex representations
of the original network and the core network are used, the core is
seen to have the exact same homology as the original complex, in the
sense that their homology spaces are isomorphic in all dimensions.

Property III.4 (Homology is preserved in the core).

$$H_k(X(G_C)) \cong H_k(X(G)) \quad \text{for all } k.$$


Proof: This property follows immediately from the result of 
Dowker’s Theorem (that a simplicial complex and its dual complex 
have the same homology), combined with the observation that if a 
vertex is dominated, its corresponding simplex in the dual complex 
will be a face of the simplex corresponding to the dominating node, 
and thus will not contribute to the structure of the dual complex. 

An alternative formulation and proof is available in [?]. ■

A corollary of Property III.1 is that at least one shortest cycle
for each homology class is retained in the core. Thus, not only is the
dimension of each homology space preserved, but the ‘hole locations’
in the network are also preserved. It is this additional property that
truly allows us to interpret the core as the global scaffolding for the
network.

Property III.4 together with Property III.3 tells us that nodes with
diverse friend sets (including bridging ties) will be in the core. If they
are not, it is only because they are dominated by another node with
all the same diverse connections. In real-world networks, we see that
the average clustering coefficient for nodes in the core is much lower
than in the network as a whole (see Section IV-A), which supports
the ‘diverse friend set’ interpretation, because the friends of a core
node are usually not friends with each other.


IV. Analysis of real-world networks 

We will use two data sets in this section as a running illustration,
both obtained from the Stanford SNAP network database [?]. The
first is a coauthorship network built from the DBLP computer
science bibliography, and the second is a co-purchasing network
from Amazon. The networks were originally analyzed by Yang and
Leskovec [?] in one of the first papers to systematically analyze
the properties of ground-truth communities (abbreviated in figures
as GTCs) in real-world networks. Both networks have ground-truth
community labels: 13,477 ground-truth community labels in
DBLP, defined as connected components of authors within the same
publication venue; and 271,570 ground-truth community labels in
Amazon, defined using product categories. Additionally, Yang and
Leskovec labeled 5000 of the communities in each data set as
“best” in terms of having community-like properties such as low
conductance or high triangle-participation ratio. We computed the
core-periphery decomposition for both networks using the iterative
node dominance collapse algorithm described in Section II-B3. For
the Amazon co-purchasing network, the periphery consisted of 70,716
nodes (accounting for only 21% of the nodes in the network), each
of which was a singleton, connected only to the core and not to
other peripheral nodes. To allow further collapse, we re-computed
the core using the 2-hop neighbor sets $\mathcal{N}_2[v]$ described in Section
II-B2. This yielded 193,195 nodes in the periphery (57.7% of the
nodes in the network), with 70,716 peripheral components, of which
20,136 were non-singletons (of varying sizes). All analysis presented
below uses the regular node dominance collapse on the DBLP data
set, and the node dominance collapse based on 2-hop neighbor sets
for the Amazon data set.

Descriptive statistics for the networks, as well as for their
associated core-periphery partitions, are presented in Table I. For the
computations of average degree and clustering coefficient, the values
were computed with respect to the entire network, and again with
respect to the induced subgraph under consideration (either the core
or periphery).
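The distinction between statistics taken “with respect to the entire network” and “with respect to the induced subgraph” can be made concrete with a short sketch. This is an illustration under stated assumptions, not the code used for the reported values: the graph is a plain dictionary of neighbor sets, the clustering coefficient is taken to be the mean local clustering coefficient, and nodes with fewer than two neighbors are assigned a coefficient of zero. The function names are hypothetical.

```python
from itertools import combinations


def local_clustering(adj, v, allowed=None):
    """Local clustering coefficient of v; neighbors are restricted to `allowed` if given."""
    nbrs = adj[v] if allowed is None else adj[v] & allowed
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumed convention for nodes with fewer than two neighbors
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def mean_degree_and_clustering(adj, nodes, induced=False):
    """Mean degree and mean clustering over `nodes`.

    induced=False: neighbors counted in the entire network.
    induced=True:  neighbors counted only inside the subgraph induced by `nodes`.
    """
    nodes = set(nodes)
    allowed = nodes if induced else None
    degrees = [len(adj[v] if allowed is None else adj[v] & allowed) for v in nodes]
    coeffs = [local_clustering(adj, v, allowed) for v in nodes]
    return sum(degrees) / len(degrees), sum(coeffs) / len(coeffs)


# Toy graph: a triangle {0, 1, 2} with a pendant node 3 attached to node 0.
toy = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
subset = {0, 1, 2}
print(mean_degree_and_clustering(toy, subset, induced=False))
print(mean_degree_and_clustering(toy, subset, induced=True))
```

On the toy graph, the pendant node raises the degree of node 0 and lowers its clustering coefficient when the entire network is used, while the induced subgraph is a plain triangle.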

TABLE I
Descriptive statistics for the DBLP and Amazon data sets, and
their core-periphery decompositions.

                                           DBLP       Amazon
  Nodes in core:                         71,018      141,688
  Nodes in periphery:                   246,062      193,195
  Nodes (total):                        317,080      334,863
  Edges within core:                    318,741      347,527
  Edges within periphery:               274,367      218,237
  Edges between core and periphery:     456,758      360,108
  Edges (total):                      1,049,866      925,872
  Mean degree:
    Entire network                         6.62         5.53
    Core (w.r.t. entire network)          15.41         7.45
    Core (w.r.t. core)                     8.98         4.91
    Periphery (w.r.t. entire network)      4.09         4.12
    Periphery (w.r.t. periphery)           2.23         2.26
  Clustering coefficient:
    Entire network                        0.632        0.397
    Core (w.r.t. entire network)          0.285        0.219
    Core (w.r.t. core)                    0.255        0.182
    Periphery (w.r.t. entire network)     0.733        0.527
    Periphery (w.r.t. periphery)          0.385        0.293
  Communities (total):
    Number                               13,477      271,570
    Average size                          53.41        11.67
    Standard deviation of size           257.58       273.66
  Communities (best):
    Number                                 5000         5000
    Average size                          22.45        13.49
    Standard deviation of size           201.08        17.52

To verify the stability of the core under multiple realizations of
the node dominance collapse algorithm, we performed the following
randomization: for one realization of the iterated node dominance
collapse, we compute the set of dominated nodes, pick one
at random to collapse, add the newly dominated nodes to the set of
dominated nodes, randomly pick the next dominated node to collapse,
and so on. After performing 100 realizations of the core-periphery
decomposition on the two data sets, we found that 99.58% (DBLP)
and 99.43% (Amazon) of the nodes in the core were present in the
core on every realization. The set of nodes that appeared in the
core on some (but not all) realizations was 0.89% (DBLP) and 1.24%
(Amazon) the size of the core. Thus, not only is the shape of the
core unique, but the actual nodes composing it are very stable in
these real-world data sets.
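The randomized collapse used for this stability check can be sketched in a few lines of Python. This is a centralized toy version written for clarity, not the distributed algorithm of Section II-B3: it recomputes the full set of dominated nodes after every removal instead of updating it incrementally, and it represents the graph as a dictionary of neighbor sets. All function names are hypothetical.

```python
import random


def dominating_neighbor(adj, v):
    """Return some w != v whose closed neighborhood contains that of v, else None."""
    closed_v = adj[v] | {v}
    for w in adj[v]:  # a dominating node is necessarily adjacent to v
        if closed_v <= adj[w] | {w}:
            return w
    return None


def collapse(adj, seed=None):
    """One realization: repeatedly remove a randomly chosen dominated node."""
    rng = random.Random(seed)
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    periphery = []
    while True:
        dominated = [v for v in adj if dominating_neighbor(adj, v) is not None]
        if not dominated:
            break
        v = rng.choice(dominated)
        for u in adj.pop(v):       # delete v and its incident edges
            adj[u].discard(v)
        periphery.append(v)
    return set(adj), periphery     # (core nodes, removal order)


def core_stability(adj, realizations=100):
    """Nodes in the core on every realization, and on at least one realization."""
    cores = [collapse(adj, seed=s)[0] for s in range(realizations)]
    return set.intersection(*cores), set.union(*cores)


# Toy graph: a 4-cycle (no dominated nodes) with a pendant node 4 attached to node 0.
toy = {0: {1, 3, 4}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}, 4: {0}}
always, ever = core_stability(toy, realizations=20)
print(always, ever)
```

On the toy graph only the pendant node is dominated, so every realization returns the 4-cycle as the core and the “always in the core” and “ever in the core” sets coincide. On a path graph, by contrast, the surviving node depends on the random choices, although the core always has a single node.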


A. Relationship of core-periphery to network structure 

For both data sets, we observe (Table I) that nodes in the core
have higher degree than nodes in the periphery, with the difference
especially pronounced in the DBLP network. Additionally, nodes in
the core have lower clustering coefficient, which corroborates our
intuition that core nodes have “diverse friend sets”, so their friends
are not all friends with each other. Along with their high degree, this
is also interpretable as having reach outside of their local community.

Scatterplots showing the natural logarithm of betweenness centrality
versus node degree are shown in Figure 2, with the two plots of the
same data alternating whether core or periphery is plotted on top, to
help display the region of overlap. As mentioned in Section III-A, all
nodes with betweenness centrality of zero (i.e. nodes through which
no shortest paths pass) are guaranteed to be in the periphery, and we
observe that additionally, all of the nodes with highest betweenness
centrality are in the core. For example, in Figure 2 it can be seen
that in the DBLP data set there is a threshold betweenness centrality
value (around ln(bc) = 17), above which all nodes are in the core,
while in the Amazon data set, it is the nodes with both high degree
and high betweenness centrality that appear exclusively in the core.

Figure 3 shows the number of ground-truth community assignments
per node in the core and periphery of the DBLP and Amazon
networks. In the DBLP network, out of all the nodes in the periphery,
22.11% had no ground-truth community (GTC) membership labels, 57.39%
had exactly one, and 20.49% had more than one GTC membership label.
On the other hand, out of the nodes in the core, 85.02% had multiple
GTC membership labels, while 12.65% had a single community,
and only 2.33% had no GTC label. From another perspective, the
periphery contained 97.05% of the nodes without a GTC label and
94.02% of the nodes with a single label, but only 45.51% of the nodes
with multiple labels (and among the multiply labeled nodes, the
average number of labels was 2.9 in the periphery, but 7.0 in the
core). A similar behavior is observed in the Amazon network, albeit
to a lesser extent, likely due to the average number of labels per
node being much higher.


Fig. 2. Log betweenness centrality vs degree in core and periphery (DBLP
top, Amazon bottom). Each data set is shown in two panels: core on top and
periphery on top.

Fig. 3. Number of community memberships for nodes in core and periphery
(DBLP top, Amazon bottom).

B. Role of core in network flow 

To demonstrate the key role our core nodes play in information
flow over the network, we computed their contribution to the shortest
paths of the network. For each network, we randomly chose 1000
pairs of nodes, and computed shortest paths between them. Since
100% of these paths contain at least one node from the core,
we computed the proportion of each path that is in the core. For
comparison, we chose three sets of nodes, each with the same number
of nodes as the core: nodes chosen uniformly at random; the nodes
of highest degree; and the nodes with highest betweenness
centrality. Then, using the same 1000 shortest paths, we computed
the proportion of nodes from each path belonging to each of these
sets. Taking the average over all 1000 paths, the mean proportions
of each path contained in the four sets (Core, Highest BC, Highest
Degree, and Random) are shown in Table II. Since betweenness
centrality measures how many shortest paths pass through a node,
the nodes with highest betweenness centrality should be the optimal
choice for this measure (if considering all shortest paths in the entire
network), so it is not surprising that they have the highest proportion
of shortest path nodes. What is somewhat more surprising is that
for both data sets, the nodes in the core out-perform the nodes with
highest degree, so a greater proportion of nodes in shortest paths
belong to the core than belong to the equal-sized set of highest
degree nodes. The proportion of nodes in the shortest paths that
belong to the Random set gives us a baseline probability against which
to compare the other choices of “important” nodes. Recall also that
betweenness centrality is very expensive computationally, requiring
global information, so it is useful that the distributed core-periphery
computation is nearly comparable at obtaining nodes central to
network flow.

TABLE II
Importance of core nodes, high betweenness centrality nodes,
high degree nodes, and randomly chosen nodes, in shortest
paths of the DBLP and Amazon networks

  Proportion of nodes in shortest paths belonging to important sets

                      DBLP     Amazon
  Highest BC         0.785      0.892
  Core               0.753      0.841
  Highest degree     0.739      0.698
  Random             0.222      0.427
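The shortest-path comparison summarized in Table II can be sketched as follows. This is a minimal illustration on an unweighted graph stored as a dictionary of neighbor sets, using breadth-first search to obtain one shortest path per pair. It assumes that the endpoints of each path are counted when computing the proportion, which the text does not specify, and the function names are hypothetical.

```python
from collections import deque


def shortest_path(adj, source, target):
    """One shortest path from source to target via BFS, or None if disconnected."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            break
        for x in adj[u]:
            if x not in parent:
                parent[x] = u
                queue.append(x)
    if target not in parent:
        return None
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


def mean_fraction_in_set(adj, pairs, node_set):
    """Mean, over the sampled pairs, of the fraction of path nodes lying in node_set.

    Assumption: path endpoints are included in the count.
    """
    fractions = []
    for s, t in pairs:
        path = shortest_path(adj, s, t)
        if path is None:
            continue  # skip disconnected pairs
        fractions.append(sum(x in node_set for x in path) / len(path))
    return sum(fractions) / len(fractions)


# Toy graph: the path 0 - 1 - 2 - 3 - 4, with the middle three nodes as the "important" set.
line = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(mean_fraction_in_set(line, [(0, 4), (1, 3)], {1, 2, 3}))
```

Evaluating the same sampled pairs against the core, the highest-degree set, the highest-betweenness set and a random set of equal size gives one row of the comparison per set.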


C. Community detection 

The findings of this study are consistent with the community
affiliation graph model (AGM) of Yang and Leskovec [?], [?],
in the sense that they support an overlapping community model
for social and information networks where the probability of an
edge between two nodes is related to their common community
membership(s), with higher probabilities of edges between nodes
that have multiple communities in common. Under this model, we
showed that nodes are only dominated (with very high probability)
by nodes which share their community memberships. Interpreting
our peripheral components with respect to this model, they appear to
be the ‘non-overlapping’ parts of communities that stick out of the
network. Figure 4 shows embeddings of some peripheral components
from the DBLP data set as examples, where the peripheral component
is drawn in black, while the core nodes and connecting edges are
grey. The internal structure and connectivity to the core can vary
considerably between peripheral components.





Fig. 4. Example peripheral components. 

In light of the interpretation of peripheral components as
non-overlapping portions of communities, we propose an algorithm which
consists of taking unions of these peripheral components, along with
their neighboring nodes in the core, to obtain candidate sets for
community detection.

More precisely, let $PC = \{pc_i\}_i$ denote the set of peripheral
components in the network, where each node in the periphery is
in exactly one peripheral component, $pc_i$. Then define the extended
peripheral components $PC^+ = \{pc_i^+\}_i$, where

$$pc_i^+ = \{v \in V_C \mid \exists\, v_j \in pc_i \text{ s.t. } (v_j, v) \in E\} \cup pc_i,$$

so each extended peripheral component additionally contains all the
nodes in the core that share an edge with a vertex of the peripheral
component. The extended peripheral components are meant to
approximate ground-truth communities in the data set; however, there
are large numbers of very small size (such as those consisting of an
isolated peripheral node and its single neighboring core node). We
consolidate extended peripheral components into “candidate sets” by
taking, for each $v \in V_C$, the union of all extended peripheral components
that include $v$. So we obtain $\{cs_v\}_{v \in V_C}$, where

$$cs_v = \bigcup_{\substack{pc_i^+ \in PC^+ \\ v \in pc_i^+}} pc_i^+.$$

For example, if there were many peripheral nodes connected to
a single core node (but not connected amongst each other), this
group would be consolidated into a single candidate set. We then
remove any candidate sets $cs_v$ that are repetitions or subsets of other
candidate sets, to obtain our final set of maximal candidate sets: $CS$.
Intuitively, our candidate sets are meant to approximate ground truth
communities, or unions of ground truth communities (that overlap on
common core nodes).
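The construction of candidate sets described above can be sketched in Python as follows. This is a minimal illustration, assuming the graph is given as a dictionary of neighbor sets and the core as a set of nodes; the function names are hypothetical and the subset-removal step is a simple quadratic check that would need a more careful implementation on large networks.

```python
def peripheral_components(adj, core):
    """Connected components of the subgraph induced by the non-core nodes."""
    seen = set()
    components = []
    for start in adj:
        if start in core or start in seen:
            continue
        component = set()
        stack = [start]
        seen.add(start)
        while stack:
            u = stack.pop()
            component.add(u)
            for x in adj[u]:
                if x not in core and x not in seen:
                    seen.add(x)
                    stack.append(x)
        components.append(component)
    return components


def candidate_sets(adj, core):
    """Maximal candidate sets built from extended peripheral components."""
    # Extended components: each peripheral component plus its adjacent core nodes.
    extended = []
    for comp in peripheral_components(adj, core):
        core_nbrs = {x for u in comp for x in adj[u] if x in core}
        extended.append(comp | core_nbrs)
    # For each core node v, take the union of the extended components containing v.
    by_core_node = {}
    for ext in extended:
        for v in ext & core:
            by_core_node.setdefault(v, set()).update(ext)
    # Drop repetitions (via the set of frozensets) and proper subsets.
    unique = {frozenset(s) for s in by_core_node.values()}
    return [s for s in unique if not any(s < t for t in unique)]


# Toy graph: core nodes "c1" and "c2" joined by an edge; peripheral nodes
# "p1", "p2" hang off "c1" (not connected to each other) and "p3" hangs off "c2".
toy = {
    "c1": {"c2", "p1", "p2"},
    "c2": {"c1", "p3"},
    "p1": {"c1"},
    "p2": {"c1"},
    "p3": {"c2"},
}
print(candidate_sets(toy, core={"c1", "c2"}))
```

In the toy graph the two singleton peripheral components attached to the same core node are consolidated into one candidate set, as in the example given in the text.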

To judge the performance of our candidate sets for the purposes
of community detection, we also ran the BIGCLAM algorithm [?]
on the DBLP data set. Popular methods for detecting overlapping
communities include clique percolation, link clustering, and fuzzy
detection methods using mixed-membership stochastic block models
(see [?] for a survey); however, none of these methods scale up
well to networks with hundreds of thousands or millions of nodes.
The recent exception to this is Yang and Leskovec’s BIGCLAM
algorithm, which can estimate the overlapping community structure
for large networks. The BIGCLAM algorithm (available in the SNAP
C++ package [?]) allows the user to input the expected number
of communities, but runs into memory problems if the number of
communities is larger than a few hundred. It also has an option for
the algorithm to learn the appropriate number of communities, with a
default to test between 5 and 100 communities. Therefore, to obtain
a set of communities of the same order as the number of ground-truth
communities (13,477 for the DBLP data set), we performed
BIGCLAM in a nested manner, first obtaining 100 communities and
then further subdividing each of these, where the optimal number of
subcommunities was most often also 100. This yielded a total of
9904 detected communities from the BIGCLAM algorithm. We used
the same method for analysis of the Amazon data set, yielding 8899
BIGCLAM communities, even though that network has a much larger
number of ground-truth communities (271,570). For both data sets,
the number of candidate sets obtained using our method was around
40,000 (47,134 for DBLP and 37,449 for Amazon).

To measure the fit of the candidate sets and BIGCLAM communities
to the ground-truth communities, we used precision, recall, and
average F1 score. For a detected community $C_1$ and ground-truth
community $C_2$ (the target), the precision is the proportion of detected
nodes that belong to the target:

$$\mathrm{precision}(C_1, C_2) = \frac{|C_1 \cap C_2|}{|C_1|},$$

the recall is the proportion of target nodes captured in the detected
community:

$$\mathrm{recall}(C_1, C_2) = \frac{|C_1 \cap C_2|}{|C_2|},$$

and the F1-score is the harmonic mean of precision and recall:

$$F_1(C_1, C_2) = \frac{2 \cdot \mathrm{precision}(C_1, C_2) \cdot \mathrm{recall}(C_1, C_2)}{\mathrm{precision}(C_1, C_2) + \mathrm{recall}(C_1, C_2)}.$$

These three values for a given ground-truth community are obtained
by maximizing each over all candidate sets (BIGCLAM communities),
and an average precision, recall, and F1-score for the ground-truth
communities is obtained. Similarly, the three values are obtained
for each candidate set (BIGCLAM community) by thinking of it as
the “target” community, and maximizing precision, recall, and
F1-score over all ground-truth communities, and then taking the average
of these maxima.
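The scoring procedure just described can be written compactly. The sketch below is illustrative only: communities are represented as Python sets, the function names are hypothetical, and the F1-score is assumed to be zero when a pair of communities is disjoint.

```python
def precision(detected, target):
    """Proportion of detected nodes that belong to the target."""
    return len(detected & target) / len(detected)


def recall(detected, target):
    """Proportion of target nodes captured by the detected community."""
    return len(detected & target) / len(target)


def f1(detected, target):
    """Harmonic mean of precision and recall (0 if the sets are disjoint)."""
    p, r = precision(detected, target), recall(detected, target)
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)


def average_best(targets, others, score):
    """For each target, maximize score(other, target) over `others`, then average."""
    return sum(max(score(o, t) for o in others) for t in targets) / len(targets)


# Toy example with one ground-truth community and two detected sets.
ground_truth = [{1, 2, 3, 4}]
detected = [{1, 2}, {3, 4, 5, 6}]

gt_f1 = average_best(ground_truth, detected, f1)    # the "ground-truth" direction
det_f1 = average_best(detected, ground_truth, f1)   # the "detected" direction
print(gt_f1, det_f1, (gt_f1 + det_f1) / 2)
```

Calling `average_best` with the two collections swapped gives the second direction of the comparison, in which each detected set is treated as the target.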

Using all three of these values (precision, recall, and F1-score)
helps offset some of the discrepancies caused by the varying numbers
of ground-truth communities, candidate sets, and BIGCLAM communities.
Since not only the matching of ground-truth communities onto detected
communities, but also the matching of detected communities onto
ground-truth communities, is considered, having more candidate sets
than BIGCLAM communities will not necessarily be an advantage.

TABLE III
Detection of all ground-truth communities by candidate sets
and BIGCLAM communities

  DBLP (all 13,477 communities)
                       Candidate sets                         BIGCLAM
              ground-truth  detected  average    ground-truth  detected  average
  Recall         0.7620      0.5401    0.6511       0.7418      0.4478    0.5948
  Precision      0.4319      0.4960    0.4640       0.2366      0.6261    0.4314
  F1-score       0.4233      0.2565    0.3399       0.2696      0.2721    0.2709

  Amazon (all 271,570 communities)
                       Candidate sets                         BIGCLAM
              ground-truth  detected  average    ground-truth  detected  average
  Recall         0.8481      0.8721    0.8601       0.9213      0.8203    0.8708
  Precision      0.2545      0.8728    0.5636       0.1124      0.9861    0.5492
  F1-score       0.3218      0.4815    0.4017       0.1611      0.4685    0.3148

  DBLP (5000 best communities)
                       Candidate sets                         BIGCLAM
              ground-truth  detected  average    ground-truth  detected  average
  Recall         0.9414      0.2559    0.5987       0.9054      0.2678    0.5866
  Precision      0.4313      0.3121    0.3717       0.3065      0.4216    0.3640
  F1-score       0.5221      0.1446    0.3333       0.3840      0.1913    0.2877

  Amazon (5000 best communities)
                       Candidate sets                         BIGCLAM
              ground-truth  detected  average    ground-truth  detected  average
  Recall         0.9893      0.0222    0.5058       0.9072      0.0728    0.4900
  Precision      0.4781      0.0404    0.2593       0.4535      0.1224    0.2880
  F1-score       0.5753      0.0241    0.2997       0.5100      0.0753    0.2927

Table III gives the values for recall, precision and F1-score
when comparing the ground-truth communities to our candidate sets
(left three columns), and to the BIGCLAM communities (right three
columns). The performance using candidate sets and BIGCLAM
communities is compared for each measure (e.g. “ground-truth
community recall”, or “average precision”), with the values in boldface
indicating the method (candidate sets or BIGCLAM) with superior
performance in that measure. The column “ground-truth” gives the
average values for the ground truth communities (when maximized
over the detected communities), and the column “detected” gives the
average for the detected communities (when maximized over
ground-truth communities).

Our candidate sets give better overall community detection
performance than the BIGCLAM communities (as measured by the average
F1-score). For the DBLP data set, ground-truth recall is higher than
detected recall: the ground-truth communities tended to be contained
in the candidate sets more often than the candidate sets found
strongly matching ground-truth communities (although it is worth noting, as
Yang and Leskovec did, that not all “true” ground-truth communities
necessarily have ground-truth community labels in this data set). The
performance on the Amazon data set is quite good, with very high
ground-truth recall and detected recall and precision for both the
candidate sets and the BIGCLAM methods, although our candidate
sets out-performed BIGCLAM in detected recall, as well as in
ground-truth, detected and average F1-scores.

The analysis was repeated using only the 5000 “best” ground-truth
communities, and again the candidate sets resulted in higher average
F1-scores than the BIGCLAM communities. The main difference
was that recall for the ground-truth communities increased (on
average, each ground-truth community had a candidate set it was
94% contained in), while recall and precision for the candidate sets
decreased (since there were fewer ground-truth communities to match
to, fewer detected communities had a well-matched ground-truth
community). It is also worth noting that for the DBLP data set 81.7% of the best
ground-truth communities were completely contained in at least one
candidate set, while 73.8% of the best ground-truth communities were
completely contained in at least one BIGCLAM community. For the
Amazon data set, these values were 94.8% for the candidate sets, and
82.8% for the BIGCLAM communities.

The challenge of detecting thousands of overlapping communities
from a large network is formidable. Currently there are no available
methods which achieve excellent performance when comparing
detected to ground-truth communities. Based on the analysis of two
large, real-world data sets with ground-truth community information,
our proposed algorithm of obtaining candidate sets from the
peripheral components of the core-periphery decomposition yielded
more accurate community detection results than the state-of-the-art
BIGCLAM algorithm for overlapping community detection, while having
much lower complexity and admitting a distributed implementation.

V. Conclusion 

This study posed the question “How does the concept of node
dominance relate to local and global properties of a network?”.
Previous work determined that iteratively removing dominated nodes
is a homology-preserving way to perform a collapse/simplification of
a simplicial complex [?], [?]. This was extended into a distributed
algorithm for the case of flag complexes [?]. Here, we undertook an
investigation of the theoretical and practical properties of performing
such a collapse on social and information networks, and discovered
that it has implications for both a core-periphery decomposition of
the network, as well as uncovering network community structure.

The properties of the core and periphery that we developed in
Section III and observed in Section IV lead to the interpretation
that nodes in the core obtained using node dominance collapse are
important with respect to network flow, to the global structure of the
network, and to the network community structure.

The core nodes are essential to network flow because of two 
properties: a shortest path between any two points in the core 
is contained in the core; and nodes with betweenness centrality 
zero (through which no shortest paths pass) are never in the core. 
Observationally, ‘hub’ nodes are contained in the core, and core nodes 
often have high degree and high betweenness centrality. 

The global structure of the network is preserved in the core 
because the homology of the core is the same as the homology of 
the entire network, when considering the respective flag complexes. 
This can be interpreted as node dominance collapses only having 
‘local’ effects, and that nodes with diverse neighbor sets (including 
bridging ties) are members of the core, maintaining a scaffolding 
for the global structure of the network. The observation that each 
core node typically has a diverse neighbor set (their friends are not 
all friends with each other) is also quantified by their relatively low
clustering coefficient values. 

Finally, the core is related to the community structure of the
network because under community membership models where
within-community connections have significantly higher probability than
cross-community connections, we see that nodes are dominated (with
high probability) by nodes that share their community membership(s).
In real-world networks with overlapping ground-truth community 
labels, this is observed through nodes with multiple community 
memberships typically residing in the core, and through nodes with 
single (or no) community labels occupying the periphery. 

The result relating the core-periphery to the community structure 
of the network gives us an additional application: the use of the 
peripheral components to generate “candidate sets” which are likely 
to contain the true network communities. Many state-of-the-art
community detection algorithms which allow for overlapping
communities are not scalable past network sizes of a few thousand nodes.
The notable recent exception is Yang and Leskovec’s BIGCLAM 
algorithm, which our method is shown to outperform on their DBLP 
dataset. 

Implications of this work may be of interest not only to researchers 
explicitly interested in a core-periphery decomposition of complex 
networks, but to anyone studying community structure, or key nodes 
for network flow. Hopefully this work will also serve to further 
popularize the node dominance collapse for use in general contexts 
where data is represented using a simplicial complex structure. 
One limitation of our method is that some networks don’t collapse
using node dominance. For example, on Facebook there are very
few people who have a friend list completely contained in the
friend list of another person. One option for future research in this
direction would involve performing the node dominance collapse
locally on ego networks, and consolidating the resulting communities.
Another potential drawback is the nondeterministic nature of the node
dominance collapse algorithm. Perhaps under some circumstances it
would be wise to consider the set of nodes that are “ever in the core”,
or “always in the core”, under repeated realizations of the algorithm.
In practice however (Section IV-A), we have seen that these two sets
are quite similar.

One other area for future research is the study of the core under
graph evolution. Using either observed or model-generated dynamic
networks, studying how the core varies over time could help
evaluate or predict community structure and key players in the
network.